Special Note for R Markdown File Only

Special Note: While this R Markdown file serves as the source for all of the materials found in this project’s writeup, it has a slightly different purpose and will not follow the paper’s order. Essentially, I want users to be able to follow this file as if it were a tutorial. As such, most of the code will be viewable, I will create more data frames than strictly necessary so the user can inspect differences, and the order of creation will differ so that similar visualizations are grouped together. To see the intended narrative and order for story purposes, read the writeup.

Abstract

The purpose of this analysis was to examine the spatial characteristics of Chicago crime data and to explore relationships between the density of that crime, proximity to locations of interest (Police and Transit Stations), and community demographics in order to inform further investment in police infrastructure. We also examined some of the temporal characteristics of crime to inform resource management/deployment. Our initial hypothesis was as follows:

  1. Crime rates (as measured by count/km2) decrease as one moves closer to a police department
    • This assumes that proximity equates to higher police presence which acts as a deterrent.
    • Alternatively, police may be more likely to make an arrest for a crime where they most regularly patrol, resulting in increased activity as one gets closer to a police station. This would cause us to reject this section of our hypothesis.
  2. Crime rates increase as one moves closer to a transit station.

The data from 2013-01-01 through 2014-12-31 indicated that crime rates do not decrease but actually increase as one moves closer to a police station, causing us to reject part 1 of the hypothesis. However, crime rates do increase dramatically as one approaches a transit station, supporting part 2 of our initial hypothesis. Our final recommendation concerning police infrastructure can be found in the conclusion.


Introduction

The “client” has requested analytic support to determine how best to invest in new police infrastructure. The purpose of this analysis is to determine significant spatial, temporal, and demographic crime patterns that could be used to guide investments and/or improve policing patterns. We will examine Chicago Crime Data from several angles using multiple visual methods.

Our initial, spatial hypothesis:

  1. Crime rates (as measured by count/km2) decrease as one moves closer to a police department
    • This assumes that proximity equates to higher police presence which acts as a deterrent.
      • Note: for the sake of simplicity we assume that each police department employs the same number of officers. This could be tightened with actual weights if we wish to extend our engagement.
    • Alternatively, police may be more likely to make an arrest for a crime where they most regularly patrol, resulting in increased activity as one gets closer to a police station. This would cause us to reject this section of our hypothesis.
  2. Crime rates increase as one moves closer to a transit station.

Assuming both parts of the hypothesis prove true and the spatial relationships are strong, we will likely suggest building small, permanent outposts in the city’s most crime-dense communities, at their most affected transit stations. Such a strategy could reduce crime rates and place the force in the best position to affect crime in their communities.

R Packages for Spatial Data Analysis

Here are links to a few of the spatial and graphical packages used herein:

The Data

Background

The superset of data is a combined set of files including Chicago crime data, geo-spatial polygons, mass transit station locations, police department locations, demographics, categorization tables, weather data, and more. All requisite data files can be found in the appropriate folder on Github. While the crime data currently covers 2013-01-01 through 2014-12-31, it can easily be updated at will. I have intentionally left 2015 data out, as I am using it as test data for a separate analysis.

Sources

Base crime data, shape files, coordinate files, and census data come from the data repository at the City of Chicago website. Although not used in this iteration of the analysis, weather data was downloaded from the Weather Underground. This analysis also uses a community-census tract mapping table courtesy of Rob Paral. The base sources are then scrubbed, reformatted, joined, etc. in R. This process can be viewed via the RMarkdown file “CreateSuperDataSet.”

General Data Structure Notes

Each tuple in the base crime dataset equates to one crime. While the base crime data is robust, this analysis required a fair amount of “munging,” mutating, and cleansing. The data includes some null fields, duplications where a given crime may have had more than one victim, and some erroneous location data, such as crimes occurring within a police station, outside of city limits, or within a different state. Some of those characteristics, like duplications for more than one victim, are appropriate and left alone. Missing or erroneous spatial data is dropped. You can see more detail and follow along via the appropriate file on Github.

Most of the demographic data is provided by the City of Chicago’s Department of Public Health as underlying pieces of their “Hardship Index”. It includes the following:

  • Percent of crowded housing
  • Percent of households below poverty
  • Unemployment rate, 16+
  • Percent aged 25+ without a high school diploma
  • Percent aged under 18 or 65+
  • Per capita income

In order to keep the files robust, but manageable, I have created categories for each primary type of crime. These Meta Categories were created by grouping crime descriptions as closely to federal definitions as possible. Categories are as follows:

  • Violent Crimes
    • Assault
    • Battery
    • Criminal Sexual Assault
    • Homicide
    • Etc.
  • Non-Violent Property Crime
    • Arson
    • Burglary (ex Home invasion)
    • Motor Vehicle Theft
    • Etc.
  • Financial Crime/Fraud
    • Deceptive Practices (embezzlement, etc.)
    • Intimidation - Extortion
    • Money Laundering
    • Etc.
  • Other
    • Traffic violations
    • Noise violations
    • Non-violent prostitution
    • Non-violent narcotic violations
    • Etc.

Note: The full map is viewable in Github. This analysis displays all 4 types in initial plots but is trimmed to just Violent Crimes and Non-Violent Property Crime for all mapping exercises.
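As an illustration of how such a grouping might look in code, the sketch below assigns a handful of Primary.Type values to Meta Categories in base R. This is a hypothetical subset for demonstration only (the function name and the short lists are mine); the full mapping lives in the Github project.

```r
# Illustrative subset of the Primary.Type -> Meta.Category mapping.
# The real, complete map is maintained in the Github project.
assign_meta_category <- function(primary_type) {
  violent   <- c("ASSAULT", "BATTERY", "CRIM SEXUAL ASSAULT", "HOMICIDE")
  property  <- c("ARSON", "BURGLARY", "MOTOR VEHICLE THEFT")
  financial <- c("DECEPTIVE PRACTICE", "INTIMIDATION", "MONEY LAUNDERING")

  ifelse(primary_type %in% violent,   "Violent Crimes",
  ifelse(primary_type %in% property,  "Non-Violent Property Crime",
  ifelse(primary_type %in% financial, "Financial Crime/Fraud",
                                      "Other")))
}

assign_meta_category(c("HOMICIDE", "ARSON", "NARCOTICS"))
# "Violent Crimes" "Non-Violent Property Crime" "Other"
```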

Hour Categories are split into 4-hour increments as follows:

  • 3am - 7am is “Pre Dawn”
  • 7am - 11am is “AM Commute”
  • 11am - 3pm is “Lunch”
  • 3pm - 7pm is “PM Commute”
  • 7pm - 11pm is “Evening”
  • 11pm - 3am is “Night”
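A minimal sketch of how these bins could be derived from the integer Hour field (0-23). The actual assignment happens upstream when the superset is built, so the function name and approach here are illustrative only:

```r
# Map an hour of day (0-23) to its 4-hour category.
# The "Night" bin wraps around midnight (11pm - 3am), so we shift the clock
# forward one hour before integer-dividing into 4-hour blocks.
hour_to_category <- function(hour) {
  labs <- c("Night", "Pre Dawn", "AM Commute", "Lunch", "PM Commute", "Evening")
  labs[((hour + 1) %% 24) %/% 4 + 1]
}

hour_to_category(c(8, 13, 23))  # "AM Commute" "Lunch" "Night"
```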

Load initial libraries
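The original library chunk is not shown in this extract; a plausible set of loads, inferred from the functions used below (select/rename from dplyr, ggplot from ggplot2, comma from scales, get_googlemap/ggmap from ggmap), would be:

```r
library(dplyr)    # select(), rename(), group_by(), summarise()
library(ggplot2)  # ggplot(), fortify()
library(scales)   # comma() axis label formatting
library(ggmap)    # get_googlemap(), ggmap()
```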

Read in the Superset of data

setwd("D:/Users/Jonathan Morken/Documents/GitHub/Capstone Project/")
crime_df_no_NAs <- read.csv("data/JoinedDataWithDist.csv", sep = ",", header = TRUE)

We need to cut extraneous columns and change the format of the following variables:

  • Make “Beat” a factor
  • Make “District” a factor
  • Add order to Hour.Category
  • Add order to Day of Week factor

#Remove unnecessary columns
crime_df_no_NAs <- select(crime_df_no_NAs, ID, Meta.Category, Primary.Type, Beat, District, Community.Name, PERCENT.OF.HOUSING.CROWDED, PERCENT.HOUSEHOLDS.BELOW.POVERTY, PERCENT.AGED.16..UNEMPLOYED, PERCENT.AGED.25..WITHOUT.HIGH.SCHOOL.DIPLOMA, HARDSHIP.INDEX, PER.CAPITA.INCOME, Total.2010, Year, Month, DayinMonth, DayofWeek, Hour, Hour.Category, Latitude.Crime, Longitude.Crime, Closest.District, PD.Dist.km, Transit.index, Transit.Dist.km)

#Fix column name formats for the sake of standardization.
crime_df_no_NAs <- rename(crime_df_no_NAs,
                          Percent.of.Housing.Crowded = PERCENT.OF.HOUSING.CROWDED,
                          Percent.Households.Below.Poverty = PERCENT.HOUSEHOLDS.BELOW.POVERTY,
                          Percent.Aged.16.Unemployed = PERCENT.AGED.16..UNEMPLOYED,
                          Percent.Aged.25.Without.Hish.School.Diploma = PERCENT.AGED.25..WITHOUT.HIGH.SCHOOL.DIPLOMA,
                          Hardship.Index = HARDSHIP.INDEX,
                          Per.Capita.Income = PER.CAPITA.INCOME,
                          Census.Population = Total.2010)

#Change formats
crime_df_no_NAs$Beat <- as.factor(crime_df_no_NAs$Beat)
crime_df_no_NAs$District <- as.factor(crime_df_no_NAs$District)

#Make Hour Category an ordered factor and order them appropriately
print(levels(crime_df_no_NAs$Hour.Category))
## [1] "AM Commute" "Evening"    "Lunch"      "Night"      "PM Commute"
## [6] "Pre Dawn"
crime_df_no_NAs$Hour.Category <- factor(crime_df_no_NAs$Hour.Category, levels(crime_df_no_NAs$Hour.Category)[ c(6,1,3,5,2,4)])

#Make DayofWeek an ordered factor
print(levels(crime_df_no_NAs$DayofWeek))
## [1] "Fri"   "Mon"   "Sat"   "Sun"   "Thurs" "Tues"  "Wed"
crime_df_no_NAs$DayofWeek <- factor(crime_df_no_NAs$DayofWeek, levels(crime_df_no_NAs$DayofWeek)[ c(2,6,7,5,1,3,4)])

#Calculate Number of Years in data for use in annualized metrics.
Num_Years_in_Data <- as.integer(length(unique(crime_df_no_NAs$Year)))

#View data structure
str(crime_df_no_NAs)
## 'data.frame':    573280 obs. of  25 variables:
##  $ ID                                         : int  9936131 9923481 9913643 9912871 9909833 9909835 9909901 9910992 9909832 9934360 ...
##  $ Meta.Category                              : Factor w/ 4 levels "Financial Crime/Fraud",..: 1 1 2 1 4 3 4 4 4 3 ...
##  $ Primary.Type                               : Factor w/ 32 levels "ARSON","ASSAULT",..: 9 9 4 9 3 12 2 3 2 7 ...
##  $ Beat                                       : Factor w/ 274 levels "111","112","113",..: 273 100 190 83 165 179 61 10 157 265 ...
##  $ District                                   : Factor w/ 24 levels "1","2","3","4",..: 23 8 16 7 14 15 6 1 12 23 ...
##  $ Community.Name                             : Factor w/ 77 levels "Albany Park",..: 33 4 58 24 76 6 5 49 43 9 ...
##  $ Percent.of.Housing.Crowded                 : num  14.8 4 4.1 3.8 2.3 6.3 4 1.3 9.6 10.8 ...
##  $ Percent.Households.Below.Poverty           : num  33.9 10.4 11.6 46.6 14.7 28.6 27.6 13.8 25.8 18.7 ...
##  $ Percent.Aged.16.Unemployed                 : num  17.3 11.7 12.6 28 6.6 22.6 28.3 4.9 15.8 14.6 ...
##  $ Percent.Aged.25.Without.Hish.School.Diploma: num  35.4 17.7 19.3 28.5 12.9 24.4 18.5 7.4 40.7 37.3 ...
##  $ Hardship.Index                             : int  85 37 35 94 10 73 74 7 76 70 ...
##  $ Per.Capita.Income                          : int  13781 23482 24336 11888 43198 15957 15528 59077 16444 15461 ...
##  $ Census.Population                          : int  56323 41081 64124 30654 82236 98514 48743 21390 35769 78743 ...
##  $ Year                                       : int  2014 2014 2014 2014 2014 2014 2014 2014 2014 2014 ...
##  $ Month                                      : int  12 12 12 12 12 12 12 12 12 12 ...
##  $ DayinMonth                                 : int  31 31 31 31 30 30 30 30 30 30 ...
##  $ DayofWeek                                  : Factor w/ 7 levels "Mon","Tues","Wed",..: 3 3 3 3 2 2 2 2 2 2 ...
##  $ Hour                                       : int  0 0 0 0 23 23 23 23 23 23 ...
##  $ Hour.Category                              : Factor w/ 6 levels "Pre Dawn","AM Commute",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ Latitude.Crime                             : num  41.9 41.7 42 41.8 41.9 ...
##  $ Longitude.Crime                            : num  -87.7 -87.7 -87.8 -87.6 -87.7 ...
##  $ Closest.District                           : Factor w/ 23 levels "1","10","11",..: 5 21 7 19 5 6 19 1 2 15 ...
##  $ PD.Dist.km                                 : num  3.24 3.53 2.64 1.01 2.16 ...
##  $ Transit.index                              : int  244 207 167 19 103 82 44 300 101 213 ...
##  $ Transit.Dist.km                            : num  2.25 5.01 2.28 1.49 1.19 ...

Proximity Analysis (Section 5 in the writeup)

Effects of police presence on crime rates

Contrary to our initial hypothesis, crime rates do not decrease as one gets closer to a given police department. While the effects are not extreme, crime rates actually appear to increase:

Note: The increasing crime density table and graph images can be found in the image folder in the Github project.

#Graph Meta.category vs. Distance to closest PD
g1 <- ggplot(data = crime_df_no_NAs,aes(x = PD.Dist.km)) +
  scale_x_continuous() +
  scale_y_continuous(labels=comma) + 
  geom_hline(yintercept=0, size=0.4, color="black") +
  geom_histogram(breaks=seq(0, 5, by = 0.25), fill="#c0392b", alpha=0.75) +
  scale_color_brewer(palette="Set1", name = "Meta.Category") + 
  facet_wrap(~Meta.Category) +
  labs(title = paste('Crime Proximity to Police Departments'),x = "Kilometers", y = "Occurrences") + 
  theme_light()

#Code to extract the figures behind the plot for the increasing crime density analysis. Note: remove the facet_wrap layer above before running this.
#head(ggplot_build(g1))

Some of the increase in the first ¼ kilometer may be due in part to incorrect location entries, such as crimes recorded as occurring within police stations. However, the overall slope is meaningful: while the surrounding area shrinks 98% as the radius falls from 2km to ¼km, crime density increases by 67%. This would seem to support the alternative hypothesis that police presence does not have a dampening effect on crime, but that arrests actually increase with police patrols.
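The 98% figure follows directly from the geometry: the area of a disc scales with the square of its radius, so shrinking the radius from 2km to ¼km leaves only (0.25/2)² ≈ 1.6% of the original area. A quick check:

```r
r_far  <- 2.00  # km
r_near <- 0.25  # km

area_far  <- pi * r_far^2   # ~12.57 km^2
area_near <- pi * r_near^2  # ~0.196 km^2

1 - area_near / area_far    # 0.984375, i.e. the ~98% shrink cited above
```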

Mass Transit Locations (as represented by the L)

Here the initial hypothesis appears to be correct. Crime rate (density) clearly increases as one moves closer to a given transit station. While the surrounding area again decreases 98% as the radius falls from 2km to ¼km, density increases by 985%. This effect starts to flatten out beyond 2 kilometers.

Note: The increasing crime density table and graph images can be found in the image folder in the Github project.

#Meta.category and Transit Location
g2 <- ggplot(data = crime_df_no_NAs,aes(x = Transit.Dist.km)) +
  scale_x_continuous() +
  scale_y_continuous(labels=comma) + 
  geom_hline(yintercept=0, size=0.4, color="black") +
  geom_histogram(breaks=seq(0, 5, by = 0.25), fill="lightblue", alpha=0.75) +
  scale_color_brewer(palette="Set1", name = "Meta.Category") + 
  facet_wrap(~Meta.Category) +
  labs(title = paste('Crime Proximity to Transit Stations'),x = "Kilometers", y = "Occurrences") + 
  theme_light()

#Code to extract the figures behind the plot for the increasing crime density analysis. Note: remove the facet_wrap layer above before running this.
#head(ggplot_build(g2))

This could be further refined with transit station volume, but that data is not currently available. Regardless, it is clear that any future infrastructure investments must account for the city’s transit characteristics.

Temporal Analysis

Examine daily distribution of crime

Note: This occurs later in the writeup.

Before we get to regression models, it may also be useful to examine some of the temporal characteristics of Chicago crime. First, it does not appear that crime substantially differs by day of week. Here are graphs showing all 4 categories.

Note: The lack of statistical significance was corroborated by linear models not shown here.

#Create grouped data set.
crime_by_dayofweek <- subset(crime_df_no_NAs,!is.na(DayofWeek)) %>%
  group_by(Meta.Category,Year,Month,DayinMonth, DayofWeek) %>%
  summarise(Count = n())

#Hour Category
d1 <- ggplot(data = crime_by_dayofweek,aes(x = DayofWeek, y = Count)) +
  scale_y_continuous(labels=comma) + 
  geom_bar(stat = "identity", fill="lightgreen", alpha=0.75)+
  scale_color_brewer(palette="Set1", name = "Meta.Category") +
  facet_wrap(~Meta.Category) +
  labs(title = paste('Crime by Day of Week'), x = "Day of Week", y = "Occurrences") + 
  theme_light()

#pdf('Images/d1.pdf')
d1

#dev.off()

The following boxplots display greater detail for the Violent Crimes and Non-Violent Property Crime categories. It is interesting to note that property crimes fall on Saturday and Sunday while violent crimes rise.

d2 <- ggplot(data = subset(crime_by_dayofweek,Meta.Category %in% c("Violent Crimes","Non-Violent Property Crime")),aes(x = DayofWeek, y = Count, colour = Meta.Category)) +
  scale_y_continuous(labels=comma) + 
  geom_boxplot(aes_string(colour="Meta.Category", fill="Meta.Category"), alpha=0.75)+
  scale_color_brewer(palette="Set1", name = "Meta.Category") +
  facet_wrap(~Meta.Category) +
  labs(title = paste('Crime by Day of Week'), x = "Day of Week", y = "Occurrences") + 
  theme_light() +
  theme(legend.position = "none") 

#pdf('Images/d2.pdf')
d2

#dev.off()

Examine hourly distribution of crime

Unlike day of week, there are large variances in crime frequency between hour categories:

#Create grouped data set.
crime_by_hour_category <- subset(crime_df_no_NAs,!is.na(Hour.Category)) %>%
  group_by(Meta.Category,Year,Month,DayinMonth,Hour.Category) %>%
  summarise(Count = n())

#Hour Category
h1 <- ggplot(data = crime_by_hour_category,aes(x = Hour.Category, y = Count)) +
  scale_y_continuous(labels=comma) + 
  geom_bar(stat = "identity", fill="lightgreen", alpha=0.75)+
  coord_flip() +
  scale_color_brewer(palette="Set1", name = "Meta.Category") +
  facet_wrap(~Meta.Category) +
  labs(title = paste('Daily Crime per Hour Category'), x = "Hour Category", y = "Occurrences") + 
  theme_light()

#pdf('Images/h1.pdf')
h1

#dev.off()

The following boxplots display greater detail for the Violent Crimes and Non-Violent Property Crime categories. Note how property crimes peak over lunch and the PM commute, while violent crimes are more common later in the day. This distribution is statistically significant and will be explored more in the regression models below.

h2 <- ggplot(data = subset(crime_by_hour_category,Meta.Category %in% c("Violent Crimes","Non-Violent Property Crime")),aes(x = Hour.Category, y = Count, colour = Meta.Category)) +
  scale_y_continuous(labels=comma) + 
  geom_boxplot(aes_string(colour="Meta.Category", fill="Meta.Category"), alpha=0.75)+
  scale_color_brewer(palette="Set1", name = "Meta.Category") +
  facet_wrap(~Meta.Category) +
  labs(title = paste('Daily Crime by Hour Category'), x = "Hour Category", y = "Occurrences") + 
  theme_light() +
  theme(legend.position = "none") 

#pdf('Images/h2.pdf')
h2

#dev.off()

Mapping the data

From here on, we will move toward mapping the data.

Read in community spatial polygons

library(rgdal)
library(geosphere)

Chicago_shp <- readOGR(dsn = "data/Boundaries - Community Areas (current)", layer = "geo_export_70e8c7f3-e054-4acc-b740-ee8916d5fa41")
## OGR data source with driver: ESRI Shapefile 
## Source: "data/Boundaries - Community Areas (current)", layer: "geo_export_70e8c7f3-e054-4acc-b740-ee8916d5fa41"
## with 77 features
## It has 9 fields
class(Chicago_shp)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
#Add in Area for each community in km^2
Chicago_shp@data$Poly_Area <- areaPolygon(Chicago_shp)/1000000 #Division required to turn into square km
Chicago_shp@data <- rename(Chicago_shp@data, Community.Name = community)
Chicago_shp@data <- rename(Chicago_shp@data, Area.Number = area_numbe)

Calculate community crime density (count/km^2) for violent crimes and property crimes.

#Subset the data to just Violent Crime and Non-Violent Property Crime
crime_subset_df <- subset(crime_df_no_NAs,Meta.Category %in% c("Violent Crimes","Non-Violent Property Crime"))

#Create data frame by community
crime_by_community <- crime_subset_df %>%
  group_by(Meta.Category,Community.Name) %>%
  summarise(Count = n())

#Convert to uppercase to match shape file
crime_by_community$Community.Name = toupper(crime_by_community$Community.Name)

#Join shape area data to crime_by_community and create density variable. We later do this join in the opposite direction to show how to do the same thing but for mapping purposes.  Only the second method would actually be required as you can access shape file data using the @ symbol.
crime_by_community <- left_join(crime_by_community,Chicago_shp@data, by = "Community.Name")
crime_by_community <- as.data.frame(crime_by_community)
crime_by_community <- select(crime_by_community, Meta.Category, Community.Name, Count, Poly_Area)
crime_by_community <- mutate(crime_by_community, Density = (Count / Poly_Area)/Num_Years_in_Data)  #Division is to annualize figure

Plot community crime density (count/km^2) for violent crimes and property crimes.

Note: This comes after the Choropleth in the writeup

As an addendum to the above map, the following bar graph shows the 20 communities with the highest density ratings. It is ordered by a community’s overall crime density and then split out to show Non-Violent Property Crime separate from Violent Crimes.

#Plot crime by community by density
Comms_to_show <- c("LOOP","NEAR NORTH SIDE","SOUTH SHORE","WEST GARFIELD PARK",
                   "WEST ENGLEWOOD","ENGLEWOOD","EAST GARFIELD PARK","NORTH LAWNDALE","CHATHAM","AUSTIN",
                   "LAKE VIEW","ROGERS PARK","AUBURN GRESHAM","GRAND BOULEVARD","WASHINGTON PARK",
                   "HUMBOLDT PARK","WOODLAWN","GREATER GRAND CROSSING","WEST TOWN","CHICAGO LAWN")

g3 <- ggplot(data = subset(crime_by_community,Community.Name %in% Comms_to_show),aes(reorder(Community.Name, Density), y = Density, fill = Density)) +
  scale_y_continuous(labels=comma) + 
  scale_fill_gradient(low = "lightblue", high = "red", guide = "colourbar") +
  geom_bar(stat = "identity", alpha=0.75)+
  coord_flip() +
  facet_wrap(~Meta.Category) +
  labs(title = paste('Top 20 Annualized Community Crime Rates by km^2'), x = "Community", y = "Annualized Crimes per km^2") +
  theme_light()

#pdf('Images/g3.pdf')
g3

#dev.off()

Note: Some communities (like the Loop and the Near North Side) are very dense but markedly one-sided in the make-up of their crime. Both of these communities are relatively affluent with per capita incomes of $65,526 and $88,669, respectively. Demographic based analysis might see their crime rates as anomalous; however, they have a very large number of accessible mass transit locations within their communities. As we will see later, this at least partially explains their very high crime rates.

  Community         Per Capita Income   Number of Closely Accessible Transit Stops
  Loop              $65,526             10
  Near North Side   $88,669             10
  Median            $20,956             4

Limit the data to max community boundaries:

summary(Chicago_shp)
## Object of class SpatialPolygonsDataFrame
## Coordinates:
##         min       max
## x -87.94011 -87.52414
## y  41.64454  42.02304
## Is projected: FALSE 
## proj4string : [+proj=longlat +ellps=WGS84 +no_defs]
## Data attributes:
##   Area.Number    comarea_   shape_area          perimeter
##  1      : 1   Min.   :0   Min.   : 16913961   Min.   :0  
##  10     : 1   1st Qu.:0   1st Qu.: 49769639   1st Qu.:0  
##  11     : 1   Median :0   Median : 79635753   Median :0  
##  12     : 1   Mean   :0   Mean   : 83614533   Mean   :0  
##  13     : 1   3rd Qu.:0   3rd Qu.: 98853168   3rd Qu.:0  
##  14     : 1   Max.   :0   Max.   :371835608   Max.   :0  
##  (Other):71                                              
##         Community.Name   comarea_id   shape_len        area_num_1
##  ALBANY PARK   : 1     Min.   :0    Min.   : 18138   1      : 1  
##  ARCHER HEIGHTS: 1     1st Qu.:0    1st Qu.: 31949   10     : 1  
##  ARMOUR SQUARE : 1     Median :0    Median : 43229   11     : 1  
##  ASHBURN       : 1     Mean   :0    Mean   : 44398   12     : 1  
##  AUBURN GRESHAM: 1     3rd Qu.:0    3rd Qu.: 49478   13     : 1  
##  AUSTIN        : 1     Max.   :0    Max.   :173626   14     : 1  
##  (Other)       :71                                   (Other):71  
##       area     Poly_Area     
##  Min.   :0   Min.   : 1.571  
##  1st Qu.:0   1st Qu.: 4.624  
##  Median :0   Median : 7.398  
##  Mean   :0   Mean   : 7.768  
##  3rd Qu.:0   3rd Qu.: 9.183  
##  Max.   :0   Max.   :34.545  
## 
bbox(Chicago_shp)
##         min       max
## x -87.94011 -87.52414
## y  41.64454  42.02304
#Limit the data to max community boundaries
crime_df_no_NAs <- subset(crime_df_no_NAs, 
                         -87.94011 <= Longitude.Crime & Longitude.Crime <= -87.52414 &
                          41.64454 <= Latitude.Crime & Latitude.Crime <= 42.02304)

#Subset the data again for standard deviation ellipse shapes below.
violent_crime_subset <- subset(crime_df_no_NAs,Meta.Category %in% c("Violent Crimes"))
prop_crime_subset <- subset(crime_df_no_NAs,Meta.Category %in% c("Non-Violent Property Crime"))

Convert to spatial points dataframe and pull in police and transit station coordinates.

library(sp)

# Convert to SpatialPointsDataFrame with longitude and latitude

#Crime data - all
coords <- SpatialPoints(crime_subset_df[, c("Longitude.Crime","Latitude.Crime")])
crime_spatial_df <- SpatialPointsDataFrame(coords, crime_subset_df)
proj4string(crime_spatial_df) <- CRS("+proj=longlat +ellps=WGS84")

#Crime data - for violent crime ellipse below
violent_coords <- SpatialPoints(violent_crime_subset[, c("Longitude.Crime","Latitude.Crime")])
violent_crime_spatial_df <- SpatialPointsDataFrame(violent_coords, violent_crime_subset)
proj4string(violent_crime_spatial_df) <- CRS("+proj=longlat +ellps=WGS84")

#Crime data - for property crime ellipse below
prop_coords <- SpatialPoints(prop_crime_subset[, c("Longitude.Crime","Latitude.Crime")])
prop_crime_spatial_df <- SpatialPointsDataFrame(prop_coords, prop_crime_subset)
proj4string(prop_crime_spatial_df) <- CRS("+proj=longlat +ellps=WGS84")

#Pull in Mass Transit and Police Department Locations
Mass_Transit <- read.csv("Data/CTAListofLStopsWithPDDistance.csv", sep = ",", header = TRUE)
Mass_Transit <- Mass_Transit[,c(1,2,3,4,5,6,7,9,12)]
#Transit_Coord <- Mass_Transit

Police_Dps <- read.csv("Data/PoliceStations.csv", sep = ",", header = TRUE)
Police_Dps <- Police_Dps[,c(1,6,7)]

Calculate mean locations and standard deviations for confidence interval ellipses

#Create Ellipse Data frame
df_ellipse <- data.frame(prop_crime_spatial_df@coords, group="Non-Violent Property Crime")
df_ellipse <- rbind(df_ellipse, data.frame(violent_crime_spatial_df@coords, group="Violent Crimes"))

#Create points for Ellipses
library(ellipse)
df_ell <- data.frame()

for (g in levels(df_ellipse$group)) {
  df_ell <- rbind(df_ell,
                  cbind(as.data.frame(with(df_ellipse[df_ellipse$group == g, ],
                        ellipse(cor(Longitude.Crime, Latitude.Crime),
                                scale = c(sd(Longitude.Crime), sd(Latitude.Crime)),
                                centre = c(mean(Longitude.Crime), mean(Latitude.Crime)),
                                level = 0.68))),
                        group = g))
}

Fortify spatial data

ggplot2 works with data frames, not objects of class Spatial*, so we have to convert the shapefile using the fortify() function from ggplot2.

Chicago_shp_df <- fortify(Chicago_shp)

Plot crime by category along with 68% confidence ellipse

Create the background layer. Note: the ideal zoom setting would be 10.5, but get_googlemap only accepts whole-number zoom levels.

## Use the "get_googlemap" command to download the images and format them for plotting
map_dat <- get_googlemap(center = c(lon = -87.723900, lat = 41.839530), zoom = 10,
  maptype = "road", language = "en-EN", sensor = FALSE, messaging = FALSE,
  urlonly = FALSE)

Use “ggmap” command to make the background plot

ChicagoMap <- ggmap(map_dat, extent = "device", legend = "topleft")
ChicagoMap

Draw 68% confidence interval on maps for both Violent and Property Crimes.

CMap_ellipse_all <- ChicagoMap + 
  geom_point(data = df_ellipse,aes(x = Longitude.Crime, y = Latitude.Crime, colour = group),alpha = .007) +
  geom_path(data=df_ell, aes(x=x, y=y), size=1, col = "white", linetype=2)+
  facet_wrap(~group)+
  theme(panel.margin = unit(1, "lines"))

#pdf('Images/ellipses.pdf')
CMap_ellipse_all +guides(colour=FALSE)

#dev.off()

Zoom Map in a little for heatmaps.

map_dat <- get_googlemap(center = c(lon = -87.723900, lat = 41.839530), zoom = 11,
  maptype = "road", language = "en-EN", sensor = FALSE, messaging = FALSE,
  urlonly = FALSE)

ChicagoMap <- ggmap(map_dat, extent = "device", legend = "topleft")

#Add Shapefile
ChicagoMap2 <- ChicagoMap + 
  geom_polygon(aes(x = long, y = lat, group = group), data = Chicago_shp_df, colour = "black", fill = NA, size = .2)

Heatmaps - Overall View of Chicago

Create base heatmap for Violent Crimes and Property Crimes.

ChicagoHeatMap <- ChicagoMap2 +
  stat_density2d(aes(x = Longitude.Crime, 
                         y = Latitude.Crime, 
                         fill = ..level.., 
                         alpha = ..level..),
                     na.rm = TRUE,
                     size = .05,
                     bins = 12,
                     data = crime_subset_df, 
                     geom = "polygon",
                     colour = "grey") +
      
      ## Configure the scale and panel
      scale_fill_continuous(low = "red", high = "darkred", guide = "colourbar") +
      scale_alpha(range = c(.1,.6)) +
      
      ## Title and labels    
      labs(x = "Longitude", y = "Latitude") +
      ggtitle(paste("Crimes in/around Chicago")) +
      
      ## Other theme settings
      theme(
        plot.title = element_text(size = 26, face = 'bold', vjust = 2),
        axis.text = element_blank(),
        axis.title = element_blank(),
        axis.ticks = element_blank(),
        legend.position = "none",
        strip.background = element_rect(fill = 'grey80'),
        strip.text = element_text(size = 18)
      )

Plot Heatmaps

Base Map

The following is an overall image of Non-violent Property Crime and Violent Crimes around Chicago. Property crime is highly focused downtown with a few moderate hot spots elsewhere. On the other hand, violent crime is widely dispersed with multiple hot zones.

#pdf('Images/heatmaps.pdf')
ChicagoHeatMap +
  facet_wrap(~Meta.Category)+
  theme(panel.margin = unit(1, "lines"))+
  guides(colour=FALSE)

#dev.off()

Add Police Stations

First, create circle function for radius drawings

# create circles data frame from the PD data frame
make_circles <- function(pointlist, radius, nPoints = 100){
    # radius: radius is measured in kilometer
    meanLat <- mean(pointlist$Latitude)
    # length per longitude changes with Latitude, so need correction. 
    # Also, note that 1 degree of latitude is the same as 111 kilometers.
    radiusLon <- radius / 111 / cos(meanLat / 57.3)  # 57.3 ~ degrees per radian
    radiusLat <- radius / 111
    circleDF <- data.frame(ID = rep(pointlist[,c(1)], each = nPoints))
    angle <- seq(0,2*pi,length.out = nPoints)

    circleDF$lon <- unlist(lapply(pointlist$Longitude, function(x) x + radiusLon * cos(angle)))
    circleDF$lat <- unlist(lapply(pointlist$Latitude, function(x) x + radiusLat * sin(angle)))
    return(circleDF)
}

# Create data frame for circles around PDs
PDCircles <- make_circles(Police_Dps, 2)

If our initial hypothesis regarding proximity to police departments were correct, we would see dips in a localized heat map around police markers. The following heat map shows police departments as markers with 2km radius zones drawn around them. Close inspection reveals that PDs are typically not located at crime peaks. However, they also do not exhibit any strong reductive effect on crime rates in their immediate, surrounding areas. This map is not detailed enough on its own to reject the first part of our hypothesis, but it does not look promising.

Note: We used a 2 km radius assuming that any proximity effects would be de minimis beyond that range (if there are any at all).

Graph:

ChicagoHeatMapPD <- ChicagoHeatMap + 
  geom_point(aes(x = Longitude, y = Latitude), data = Police_Dps, color="blue", size = 3, alpha = .5) +
  geom_polygon(data = PDCircles, aes(lon, lat, group = ID), color = "blue", alpha = .1, fill = "blue")

#pdf('Images/heatmapwithPD.pdf')
ChicagoHeatMapPD #+

#  facet_wrap(~Meta.Category)+
#  theme(panel.margin = unit(1, "lines"))+
#  guides(colour=FALSE)
#dev.off()

Replace Police Stations with Transit Stations

A transit station overlay (stations of the "L") shows a great deal of overlap between common transportation corridors and crime hot spots. Again, this is not enough to confirm the hypothesized relationship between transit proximity and spatial crime rates, but it does provide a promising initial picture.

Note: Although we do not have the data to confirm it, we suspect that this relationship would only be reinforced by adding bus routes:

ChicagoHeatMapTrans <- ChicagoHeatMap + 
  geom_point(aes(x = Longitude, y = Latitude), data = Mass_Transit, color="darkgreen", size = 3, alpha = .5) 

#pdf('Images/heatmapwithTrans.pdf')
ChicagoHeatMapTrans #+

#  facet_wrap(~Meta.Category)+
#  theme(panel.margin = unit(1, "lines"))+
#  guides(colour=FALSE)
#dev.off()

Crime by Community Choropleth

Create base data for Choropleth

#Add density to Chicago_shp file

#Create data frame by community
crime_by_community2 <- crime_subset_df %>%
  group_by(Community.Name) %>%
  summarise(Count = n(),
            Num.Close.Transit = n_distinct(Transit.index),
            Percent.of.Housing.Crowded = median(Percent.of.Housing.Crowded),
            Percent.Households.Below.Poverty = median(Percent.Households.Below.Poverty),
            Percent.Aged.16.Unemployed = median(Percent.Aged.16.Unemployed),
            Percent.Aged.25.Without.Hish.School.Diploma = median(Percent.Aged.25.Without.Hish.School.Diploma),
            Census.Population = median(Census.Population),
            Hardship.Index = median(Hardship.Index),
            Per.Capita.Income = median(Per.Capita.Income))

#Convert to uppercase to match shape file
crime_by_community2$Community.Name = toupper(crime_by_community2$Community.Name)

#Join shape area data to crime_by_community and create density variable
Chicago_shp@data <- left_join(Chicago_shp@data, crime_by_community2, by = "Community.Name")
Chicago_shp@data <- mutate(Chicago_shp@data, Density = (round((Count / Poly_Area), 2))/Num_Years_in_Data)
Chicago_shp@data <- mutate(Chicago_shp@data, People.Per.km = (round((Census.Population / Poly_Area), 2)))

#Create slimmer version for choropleth:
choro <- Chicago_shp[,c("Area.Number","Community.Name","Density","Num.Close.Transit","People.Per.km","Percent.Households.Below.Poverty","Percent.Aged.16.Unemployed")]
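To make the Density units concrete, here is a toy check of the formula above with hypothetical numbers. It assumes `Poly_Area` is in km² and `Num_Years_in_Data` is 2, matching the 2013-2014 window:

```r
Num_Years_in_Data <- 2   # 2013-01-01 through 2014-12-31
# Hypothetical community: 5,000 crimes recorded over a 10 km^2 area
toy <- data.frame(Community.Name = "EXAMPLE", Count = 5000, Poly_Area = 10)
toy$Density <- round(toy$Count / toy$Poly_Area, 2) / Num_Years_in_Data
toy$Density  # 5000 crimes / 10 km^2 / 2 years -> 250 crimes per km^2 per year
```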

Plot Choropleth

So far, we have demonstrated that crime rate (density) increases as one moves closer to both police stations and transit locations, but given the structure and political influence of community organizations, it may prove equally useful to view the problem and any resource deployment recommendations bounded by community borders. The following choropleth displays Chicago communities according to increasing crime “density” as measured by annualized crime rate per square kilometer.

Note: This is an image of the interactive html file available via the project’s Github folder.

#Commented out here as it creates a separate html

#library(plotGoogleMaps)
#plotGoogleMaps(choro,zcol="Density",filename="Crimes_Communities.html",layerName="Density of Crimes", fillOpacity=0.4,strokeWeight=0,mapTypeId="ROADMAP")

Demographics and Other Community Level Characteristics

We chose a number of community-level demographics to explore for our final regression models. To summarize, we found that only 4 of the chosen factors were significantly correlated with a community's crime rate (density). See the table and graphical correlation matrix below:

Note: We will use the number of transit stations accessible to a given community as a proxy for individual crimes' proximity figures in our regression models. Also, note the collinearity between "% Households Below the Poverty Line" and "% Aged 16 Unemployed" (corr = 0.81) and between "# of Transit Stations" and "Population Density" (corr = 0.48). This dictates choosing only one variable from each pair.

Note: The density correlation table can be found in the image folder in the Github project.

Plot Density vs. Various Community Characteristics
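Each pairwise test below was produced with `cor.test()` against Density. The general pattern looks like this (synthetic data, not our crime data; with 74 rows the test reports df = 72, matching the output below):

```r
set.seed(1)
x <- rnorm(74)              # stand-in for Density (one row per community)
y <- 0.6 * x + rnorm(74)    # stand-in for, e.g., Num.Close.Transit
ct <- cor.test(x, y)        # Pearson product-moment correlation by default
ct$estimate                 # sample correlation
ct$p.value                  # significance of the correlation
```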

## 
##  Pearson's product-moment correlation
## 
## data:  Chicago_shp@data$Density and Chicago_shp@data$Num.Close.Transit
## t = 6.323, df = 72, p-value = 1.907e-08
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4273692 0.7267872
## sample estimates:
##       cor 
## 0.5975183
## 
##  Pearson's product-moment correlation
## 
## data:  Chicago_shp@data$Density and Chicago_shp@data$Hardship.Index
## t = 1.0346, df = 72, p-value = 0.3043
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1105222  0.3401254
## sample estimates:
##       cor 
## 0.1210331
## 
##  Pearson's product-moment correlation
## 
## data:  Chicago_shp@data$Density and Chicago_shp@data$Per.Capita.Income
## t = 1.4717, df = 72, p-value = 0.1455
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.0599510  0.3843783
## sample estimates:
##       cor 
## 0.1708887
## 
##  Pearson's product-moment correlation
## 
## data:  Chicago_shp@data$Density and Chicago_shp@data$Percent.of.Housing.Crowded
## t = 0.35381, df = 72, p-value = 0.7245
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1886336  0.2676121
## sample estimates:
##        cor 
## 0.04166083
## 
##  Pearson's product-moment correlation
## 
## data:  Chicago_shp@data$Density and Chicago_shp@data$Percent.Households.Below.Poverty
## t = 3.5473, df = 72, p-value = 0.0006893
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1723994 0.5644557
## sample estimates:
##       cor 
## 0.3857041
## 
##  Pearson's product-moment correlation
## 
## data:  Chicago_shp@data$Density and Chicago_shp@data$Percent.Aged.16.Unemployed
## t = 2.0115, df = 72, p-value = 0.04801
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.002289403 0.436176558
## sample estimates:
##       cor 
## 0.2306675
## 
##  Pearson's product-moment correlation
## 
## data:  Chicago_shp@data$Density and Chicago_shp@data$Percent.Aged.25.Without.Hish.School.Diploma
## t = -0.63253, df = 72, p-value = 0.529
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.2977788  0.1568244
## sample estimates:
##         cor 
## -0.07433817
## 
##  Pearson's product-moment correlation
## 
## data:  Chicago_shp@data$Density and Chicago_shp@data$People.Per.km
## t = 4.3812, df = 72, p-value = 3.939e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2572555 0.6220714
## sample estimates:
##       cor 
## 0.4587857


Linear Models

Given each of the above factors, we modeled the linear relationships between a given community's overall crime count and its crime density in separate linear models, using the factors from the above analysis that appear most significant after accounting for collinearity. These are Hour Category, the # of Transit Stations in Close Proximity to a Given Community, % Households Below the Poverty Line, and Minimum Distance to a Transit Station (as a proxy for access to any public transportation).

Model Against Crime Counts

A model based on these few factors can explain just over 40% of the variance in community crime counts. Note: Minimum Distance to a Transit Station is not a significant factor in this model.

## 
## Call:
## lm(formula = Count ~ Hour.Category + Num.Close.Transit + Percent.Households.Below.Poverty + 
##     Minimum.Transit.Dist, data = crime_by_community3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -755.23 -161.67  -56.18  111.33 1403.43 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      -162.091     32.580  -4.975 7.84e-07 ***
## Hour.CategoryAM Commute           168.201     31.376   5.361 1.06e-07 ***
## Hour.CategoryLunch                291.159     31.375   9.280  < 2e-16 ***
## Hour.CategoryPM Commute           348.592     31.377  11.110  < 2e-16 ***
## Hour.CategoryEvening              292.607     31.375   9.326  < 2e-16 ***
## Hour.CategoryNight                163.950     31.375   5.225 2.17e-07 ***
## Num.Close.Transit                  24.015      1.413  16.990  < 2e-16 ***
## Percent.Households.Below.Poverty    7.131      0.785   9.084  < 2e-16 ***
## Minimum.Transit.Dist               -7.464      5.812  -1.284    0.199    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 269.9 on 879 degrees of freedom
##   (55 observations deleted due to missingness)
## Multiple R-squared:  0.4136, Adjusted R-squared:  0.4083 
## F-statistic:  77.5 on 8 and 879 DF,  p-value: < 2.2e-16

I.e., within a given year and hour category, the above variables can explain about 41% of the variance in crime counts by community.
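The R-squared figures quoted in this section come straight from `summary()` of an `lm` fit. A minimal sketch of the pattern on synthetic data (the real fits use crime_by_community3/4 and the formulas shown in the Call: lines above):

```r
set.seed(2)
df  <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))
fit <- lm(y ~ x1 + x2, data = df)   # same call shape as the models above
summary(fit)$r.squared              # "Multiple R-squared" in the printout
summary(fit)$adj.r.squared          # "Adjusted R-squared" in the printout
```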

Model Against Crime Density

Another model based on the same factors can explain 51% of the annual variance in community crime density.

## 
## Call:
## lm(formula = Density ~ Hour.Category + Num.Close.Transit + Percent.Households.Below.Poverty + 
##     Minimum.Transit.Dist, data = crime_by_community4)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -68.22 -16.76  -1.79  13.92 229.05 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      -21.06111    3.65400  -5.764 1.14e-08 ***
## Hour.CategoryAM Commute           24.42560    3.51890   6.941 7.55e-12 ***
## Hour.CategoryLunch                42.76953    3.51888  12.154  < 2e-16 ***
## Hour.CategoryPM Commute           50.40680    3.51899  14.324  < 2e-16 ***
## Hour.CategoryEvening              41.45542    3.51885  11.781  < 2e-16 ***
## Hour.CategoryNight                22.28764    3.51884   6.334 3.81e-10 ***
## Num.Close.Transit                  2.77874    0.15852  17.529  < 2e-16 ***
## Percent.Households.Below.Poverty   1.19176    0.08804  13.536  < 2e-16 ***
## Minimum.Transit.Dist              -2.81023    0.65185  -4.311 1.81e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 30.27 on 879 degrees of freedom
##   (55 observations deleted due to missingness)
## Multiple R-squared:  0.5165, Adjusted R-squared:  0.5121 
## F-statistic: 117.4 on 8 and 879 DF,  p-value: < 2.2e-16

Extra Credit

Relationship Between Localized Transit Station Crime Rates and PD Distance

I ran the following test to see whether transit stations closer to PDs have lower localized crime rates. They do not; the correlation is weak and not statistically significant.

#Create data frame by Station Name
Mass_Transit_Closest_PD <- subset(Mass_Transit,PD.Dist.km < 2) %>%
  group_by(STATION_NAME, Transit.index, STOP_NAME) %>%
  summarise(Closest.PD.Dist = median(PD.Dist.km))

Mass_Transit_Closest_PD <- rename(Mass_Transit_Closest_PD, Station.Name = STATION_NAME)
            
km_to_Transit_Crimes <- subset(crime_subset_df, Transit.Dist.km < 2) %>%
  group_by(Transit.index) %>%
  summarise(Count = n())

#Join Station Proximity Data 
Mass_Transit_Closest_PD <- left_join(Mass_Transit_Closest_PD, km_to_Transit_Crimes, by = "Transit.index")
Mass_Transit_Closest_PD$Station.Name <- as.factor(Mass_Transit_Closest_PD$Station.Name)

# Transit Proximity to PD Effects on Crime less Loop outliers.
g12 <- ggplot(data = Mass_Transit_Closest_PD, aes(y = Count, x = Closest.PD.Dist, color = Count)) +
  geom_point(shape = 19, size = 4) +
  scale_color_gradient(low = "lightblue", high = "red") +
  scale_y_continuous(labels=comma) + 
  scale_x_continuous(labels=comma) + 
  geom_smooth(method = "lm") +
  labs(title = paste('Crime Frequency in 2 km Radius Around Transit Station vs. Station Distance from Police Department'), x = "Distance to Closest PD (km)", y = "Count of Crimes") + 
  theme_light()

#pdf('Images/g6.pdf')
g12

#dev.off()

cor.test(Mass_Transit_Closest_PD$Count,Mass_Transit_Closest_PD$Closest.PD.Dist)
## 
##  Pearson's product-moment correlation
## 
## data:  Mass_Transit_Closest_PD$Count and Mass_Transit_Closest_PD$Closest.PD.Dist
## t = -1.4982, df = 161, p-value = 0.136
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.2661814  0.0371309
## sample estimates:
##        cor 
## -0.1172589

Conclusion

Infrastructure Proposal

Based on this analysis, it appears that crime density actually increases as one moves closer to both police departments and transit stations. With regard to police departments, this may simply reflect that police are more likely to make an arrest where they most regularly patrol.
While it may not have the hypothesized benefit of reducing crime rates, placing police departments near transit stations still gives police the best chance to affect crime in their areas. As such, the city may be best served by building small, permanent outposts at its most affected transit stations rather than a single, large department. This strategy could reduce costs while having a targeted effect on demonstrated hot spots. Regular assignments to these posts could have the added benefit of improving community relations and participation through personal familiarity. I would recommend running a few trials in the densest communities to determine the effect. Per the choropleth and bar graphs in the community section of this paper, I would recommend separate trials in West and East Garfield Park for violent crimes and in the Loop and the Near North Side for property crimes. Separate trials should help isolate any divergent effects on violent crimes vs. non-violent property crimes.

Proposal for further analysis

As has been reported in recent (spring of 2015) news, Chicago is “off to (its) deadliest start in nearly two decades.” Homicides and shootings are up by more than 70% year over year through the first quarter, while arrests and investigative stops have fallen (follow the link for the source). The reasons for those shifts are beyond the scope of this analysis. However, a time lapse analysis could shed some light on growth patterns. If the client wishes to move forward, I would suggest the following:

  • Add additional transit station locations and volume estimates for each location. E.g. bus routes
  • Add Police Station employee data
  • Add additional community and/or demographic data
  • Perform temporal analysis showing acceleration/deceleration of crime rates per community to understand trends.

Temporal (Schedule) Conclusions

Other than relatively weak weekend effects, patrol schedules do not need to adjust for day-of-week volume. There are, however, strong shifts in hourly volume, and staff should be scheduled accordingly.

Proposal for further analysis

As a follow up to this analysis, I would suggest the following:

  • Perform an Erlang-C style analysis to determine workforce requirements on an hourly basis per police district.
  • Examine Holiday effects on crime.
  • Expand prediction models to include other variables that could assist with scheduling decisions such as weather.